In the context of escalating global environmental challenges, the shift from traditional fossil fuel-based energy sources to renewable energy has become a focal point in efforts to reduce carbon emissions and combat climate change (Kabeyi and Olanrewaju, 2022). However, the transition’s broader environmental impacts, particularly on air quality, remain less explored. These impacts are especially relevant given rising global energy demand and the need to meet it sustainably.
Understanding the relationship between renewable energy generation and air quality is crucial. Renewable energy sources like wind and solar are lauded for their lower environmental impact compared to fossil fuels, which are major contributors to air pollution (UCSUSA, 2018). Air pollution is a significant environmental hazard, affecting human health, ecosystems, and the climate. It is responsible for millions of premature deaths annually and contributes to the occurrence of diseases like asthma, heart disease, and lung cancer (National Geographic, 2023). Therefore, assessing how increased renewable energy generation affects air quality indicators is not just an environmental concern but a public health imperative.
The hypothesis underlying this research is that an increase in renewable energy generation leads to a reduction in air pollution. This hypothesis is grounded in the understanding that renewable energy sources such as wind and solar, unlike fossil fuels, do not emit pollutants like sulfur dioxide, nitrogen oxides, and particulate matter during electricity generation.
1: What is the relationship between distributed renewable energy generation and the level of air pollution?
This question aims to investigate the correlation between the rise in renewable energy generation and the concentrations of various air pollutants. It seeks to understand whether regions with higher renewable energy output exhibit lower levels of air pollutants.
2: Among air quality indicators (PM10, PM2.5, CO, NO2, and SO2), which display the most significant response to variations in energy generation?
This question delves deeper into identifying which specific pollutants are most responsive to changes in energy generation types. It is crucial for pinpointing the environmental benefits of renewable energy sources and for policy-making aimed at targeted air pollution reduction.
The exploratory analysis required combining air quality and power plant datasets. Air quality data was obtained from the United States Environmental Protection Agency (EPA), while power plant data was obtained from the U.S. Energy Information Administration (EIA). The sample period spans the two decades from 2001 to 2021.
| Dataset | Source | Variables Used |
|---|---|---|
| Air Quality Summary Statistics by Criteria Pollutants and Location | EPA Air Quality System (AQS) | Monthly Mean Ozone and PM2.5 |
| Power Plant Generator Level Capacities and Locations | EIA Form EIA-860 | Annual Installed Generation Capacity by Fuel Type |
| Power Plant Monthly Energy Generation | EIA Form EIA-923 | Monthly Net Generation by Fuel Type |
We began our exploratory analysis by examining if and how solar and wind generating capacity, net generation, and air quality have changed over time in the contiguous United States. We first used the wrangled power plant data to visualize, both quantitatively and spatially, the change in total installed solar and wind capacity over the period 2001 - 2021. From the exploratory line plot, we note that the three states with the highest cumulative installed capacity are California, Texas and Iowa, all of which show significant growth in installed capacity compared to other states. To visualize the energy generation associated with these installations, we also plotted annual energy generation from solar and wind over the same period and noted that the states with the largest installed capacity are also the states with the highest annual generation from these renewable sources.
The change in installed solar and wind plants can also be visualized spatially across the contiguous United States, as illustrated below. From the map, we note that the number of installed solar and wind plants increased significantly over the two decades from 2001 - 2021.
Narrowing down to the three states with the highest growth in installed capacity, we used ggplot and gganimate to visualize the increase and distribution of plants within each state as shown below:
After establishing that the installed capacity of solar and wind plants has increased over time in the United States, and particularly in California, Texas and Iowa, we explored the air quality data to establish how the concentrations of key criteria pollutants have changed over time. To do this, we visualized the wrangled air quality data over time for key pollutants related to fossil fuel generation, including SO2, NOX, and PM2.5. Based on the outputs shown below, the measured pollutant concentrations have been trending downwards over time.
What is the relationship between distributed renewable energy generation and the level of air pollution?
We can formulate null and alternative hypotheses for the above research question as follows:
H0: There is no change in recorded air quality with an increase in renewable energy generation in the states of California, Texas and Iowa over the period 2001 - 2021.
Ha: There is a change in recorded air quality with an increase in renewable energy generation in the states of California, Texas and Iowa over the period 2001 - 2021.
To evaluate this hypothesis, we generated a plot of Mean PM2.5 measured against net monthly solar and wind generation for all three states.
The figures above suggest that measured pollutant concentrations have an inverse relationship (negative correlation) with net generation from solar and wind: the more energy generated from wind and solar, the lower the concentrations of the three criteria pollutants. To investigate this further, we performed a simple linear regression of the monthly mean concentration of each pollutant on net energy generation for each state, with the results shown in the R output below:
##
## Call:
## lm(formula = MeanPM25 ~ NetGeneration, data = df_CA.energy.air.data.PM2.5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3063 -2.3934 -0.9329 1.0735 24.1288
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.200e+01 3.343e-01 35.905 < 2e-16 ***
## NetGeneration -7.729e-07 1.575e-07 -4.907 1.67e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.675 on 250 degrees of freedom
## Multiple R-squared: 0.08784, Adjusted R-squared: 0.08419
## F-statistic: 24.07 on 1 and 250 DF, p-value: 1.672e-06
##
## Call:
## lm(formula = MeanPM25 ~ NetGeneration, data = df_TX.energy.air.data.PM2.5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1238 -1.1574 -0.1915 1.1334 7.2755
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.079e+01 1.613e-01 66.883 < 2e-16 ***
## NetGeneration -2.771e-07 3.781e-08 -7.329 3.2e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.746 on 250 degrees of freedom
## Multiple R-squared: 0.1769, Adjusted R-squared: 0.1736
## F-statistic: 53.72 on 1 and 250 DF, p-value: 3.197e-12
##
## Call:
## lm(formula = MeanPM25 ~ NetGeneration, data = df_IA.energy.air.data.PM2.5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1263 -1.7642 -0.1452 1.1926 10.7097
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.111e+01 2.235e-01 49.733 < 2e-16 ***
## NetGeneration -1.283e-06 1.557e-07 -8.239 9.78e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.405 on 250 degrees of freedom
## Multiple R-squared: 0.2136, Adjusted R-squared: 0.2104
## F-statistic: 67.88 on 1 and 250 DF, p-value: 9.784e-15
##
## Call:
## lm(formula = MeanSO2 ~ NetGeneration, data = df_CA.energy.air.data.SO2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.60143 -0.27853 -0.06666 0.23838 1.28392
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.233e+00 3.179e-02 38.78 <2e-16 ***
## NetGeneration -2.121e-07 1.498e-08 -14.16 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3494 on 250 degrees of freedom
## Multiple R-squared: 0.4451, Adjusted R-squared: 0.4429
## F-statistic: 200.6 on 1 and 250 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = MeanSO2 ~ NetGeneration, data = df_TX.energy.air.data.SO2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.86002 -0.43834 -0.00575 0.30639 1.92076
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.349e+00 4.912e-02 27.468 < 2e-16 ***
## NetGeneration -9.488e-08 1.152e-08 -8.238 9.88e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5318 on 250 degrees of freedom
## Multiple R-squared: 0.2135, Adjusted R-squared: 0.2103
## F-statistic: 67.86 on 1 and 250 DF, p-value: 9.879e-15
##
## Call:
## lm(formula = MeanSO2 ~ NetGeneration, data = df_IA.energy.air.data.SO2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.46099 -0.53054 -0.04837 0.46070 2.43395
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.119e+00 6.810e-02 31.11 <2e-16 ***
## NetGeneration -6.421e-07 4.744e-08 -13.54 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7329 on 250 degrees of freedom
## Multiple R-squared: 0.4229, Adjusted R-squared: 0.4206
## F-statistic: 183.2 on 1 and 250 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = MeanNO2 ~ NetGeneration, data = df_CA.energy.air.data.NOX)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9093 -2.2522 0.0044 1.9175 7.2354
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.466e+01 2.632e-01 55.70 <2e-16 ***
## NetGeneration -1.826e-06 1.240e-07 -14.72 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.893 on 250 degrees of freedom
## Multiple R-squared: 0.4644, Adjusted R-squared: 0.4622
## F-statistic: 216.7 on 1 and 250 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = MeanNO2 ~ NetGeneration, data = df_TX.energy.air.data.NOX)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4112 -1.8887 -0.1455 1.6983 5.9992
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.526e+00 2.164e-01 44.025 <2e-16 ***
## NetGeneration -4.695e-07 5.073e-08 -9.254 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.343 on 250 degrees of freedom
## Multiple R-squared: 0.2551, Adjusted R-squared: 0.2522
## F-statistic: 85.63 on 1 and 250 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = MeanNO2 ~ NetGeneration, data = df_IA.energy.air.data.NOX)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6786 -1.4912 -0.1341 1.4061 6.4046
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.393e+00 1.871e-01 44.87 <2e-16 ***
## NetGeneration -1.501e-06 1.295e-07 -11.59 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.988 on 247 degrees of freedom
## Multiple R-squared: 0.3522, Adjusted R-squared: 0.3496
## F-statistic: 134.3 on 1 and 247 DF, p-value: < 2.2e-16
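The nine summaries above are easier to compare when the slope, p-value and R-squared of each fit are collected into one table. The following is a minimal sketch of that step, assuming synthetic stand-in data: `make_state`, `summarise_fit`, `state_data` and `fit_table` are hypothetical names, and the generated numbers are illustrative, not the study data.

```r
# Synthetic stand-in data: 252 monthly observations (21 years x 12 months).
# All values are made up for illustration and are NOT the study data.
set.seed(1)
n <- 252
make_state <- function(slope) {
  gen <- seq(1e5, 5e6, length.out = n)  # hypothetical net generation (MWh)
  data.frame(NetGeneration = gen,
             MeanPM25 = 12 + slope * gen + rnorm(n, sd = 2))
}
state_data <- list(CA = make_state(-8e-07),
                   TX = make_state(-5e-07),
                   IA = make_state(-1e-06))

# Fit the same simple model for each state and keep only the headline numbers.
summarise_fit <- function(df) {
  s <- summary(lm(MeanPM25 ~ NetGeneration, data = df))
  data.frame(slope     = s$coefficients["NetGeneration", "Estimate"],
             p_value   = s$coefficients["NetGeneration", "Pr(>|t|)"],
             r_squared = s$r.squared,
             resid_df  = s$df[2])  # residual degrees of freedom
}

# One row per state, with the state code as the row name.
fit_table <- do.call(rbind, lapply(state_data, summarise_fit))
print(fit_table)
```

With the real data, the list would hold the per-state, per-pollutant data frames used in the `lm()` calls above, so the slopes and R-squared values could be read side by side.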
airQualityAIC <- lm(MeanPM25 ~ Month + Year + NetGeneration,
                    data = df_CA.energy.air.data.PM2.5)
# Choose a model by AIC using a stepwise algorithm
step(airQualityAIC)
## Start: AIC=640.19
## MeanPM25 ~ Month + Year + NetGeneration
##
## Df Sum of Sq RSS AIC
## - NetGeneration 1 13.932 3110.7 639.32
## <none> 3096.7 640.19
## - Year 1 57.554 3154.3 642.83
## - Month 1 226.376 3323.1 655.97
##
## Step: AIC=639.32
## MeanPM25 ~ Month + Year
##
## Df Sum of Sq RSS AIC
## <none> 3110.7 639.32
## - Month 1 228.85 3339.5 655.21
## - Year 1 362.43 3473.1 665.09
##
## Call:
## lm(formula = MeanPM25 ~ Month + Year, data = df_CA.energy.air.data.PM2.5)
##
## Coefficients:
## (Intercept) Month Year
## 407.3051 0.2761 -0.1980
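The AIC values reported by `step()` can be reproduced by hand from the RSS column, which makes the selection rule explicit. The sketch below assumes n = 252 monthly observations (inferred from the 250 residual degrees of freedom in the simple California PM2.5 regression) and uses the form n·log(RSS/n) + 2k that `extractAIC()` applies to linear models, where k counts the estimated coefficients. The helper name `aic_from_rss` is hypothetical.

```r
# AIC up to an additive constant, as computed by extractAIC() for lm fits:
#   n * log(RSS / n) + 2 * k, with k = number of estimated coefficients.
aic_from_rss <- function(rss, n, k) n * log(rss / n) + 2 * k

n_obs <- 252  # assumed: 21 years x 12 months (250 residual df + 2 coefficients)

# Full model: intercept + Month + Year + NetGeneration (RSS taken from the output above)
aic_full <- aic_from_rss(3096.7, n_obs, k = 4)

# Reduced model: intercept + Month + Year
aic_reduced <- aic_from_rss(3110.7, n_obs, k = 3)

print(round(c(full = aic_full, reduced = aic_reduced), 2))
```

Because the RSS values are rounded in the printed output, the recomputed figures agree with the reported 640.19 and 639.32 only to within rounding. Dropping NetGeneration lowers the AIC, so the stepwise search retains only Month and Year for California PM2.5. One possible reading is that the negative slope in the simple regression partly reflects a shared time trend rather than an effect of generation itself.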
Interpretation of the Boxplots
PM2.5 Levels: The distribution of PM2.5 levels varies widely between states. California shows a particularly high range of PM2.5 concentrations with notable outliers, indicating episodes of very poor air quality. It’s important to look into the reasons for California’s variability, such as wildfires or urban pollution.
CO Levels: CO levels are relatively uniform across the states, with fewer outliers compared to PM2.5. This could indicate a more consistent source of CO pollution, such as traffic, across these states.
NO2 Levels: NO2 levels are somewhat variable, with Illinois showing a higher median concentration. This could be associated with industrial activities or high traffic density.
PM10 Levels: PM10 shows a spread similar to PM2.5, with California again showing high variability and outliers. This suggests common sources of particulate matter affecting both PM10 and PM2.5.
SO2 Levels: The distribution of SO2 is quite tight in most states, except for a few outliers. This pollutant is often associated with industrial processes and the burning of sulfur-containing fuels.
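The outliers discussed above are the points a boxplot draws beyond its whiskers, by default more than 1.5 times the interquartile range from the hinges. As a minimal sketch of how such episodes could be listed per state, the example below applies `boxplot.stats()` to synthetic data; the data frame `pm25` and its values are hypothetical and only illustrate the mechanics.

```r
# Synthetic monthly PM2.5 means for two states (made-up values, not the study data).
set.seed(7)
pm25 <- data.frame(
  State = rep(c("CA", "TX"), each = 50),
  Mean  = c(rnorm(48, mean = 12, sd = 2), 45, 60,  # two injected high-pollution episodes
            rnorm(50, mean = 10, sd = 2))
)

# boxplot.stats()$out returns the values lying beyond the whiskers,
# i.e. the points a boxplot would draw individually.
outliers <- lapply(split(pm25$Mean, pm25$State),
                   function(x) boxplot.stats(x)$out)
print(outliers)
```

Listing the flagged months alongside their dates would make it possible to check whether California's extreme values coincide with known events such as wildfire seasons.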
library(dplyr)
library(ggplot2)  # required for ggplot() below

# Identify the 10 states with the highest total net generation
top_states <- df_generation_monthly %>%
  group_by(State) %>%
  summarise(TotalGeneration = sum(NetGeneration)) %>%
  slice_max(TotalGeneration, n = 10) %>%
  pull(State)

# Time series plot of renewable energy generation for the top 10 states
ggplot(df_generation_monthly %>% filter(State %in% top_states),
       aes(x = Date, y = NetGeneration, group = State, color = State)) +
  geom_line() +
  labs(title = "Renewable Energy Generation Over Time in Top 10 States",
       x = "Date",
       y = "Net Generation (MWh)") +
  theme_minimal()
The line graph above displays the trend of renewable energy generation over time in the top 10 states. There is a general upward trend in renewable energy generation across all states, indicating increased adoption and capacity over time. California (CA) stands out with significantly higher generation, especially with a steep increase around 2020. Other states also show growth in renewable energy generation, but to varying degrees; for instance, Texas (TX) and Iowa (IA) show notable increases. The variability in generation over time could be influenced by factors like state policies, technological advancements, and investment in renewable energy infrastructure. The overall increasing trend aligns with global efforts to transition to cleaner energy sources to reduce reliance on fossil fuels and combat climate change.
# Check unique states in both datasets
unique(df_generation_monthly$State)
## [1] AK AZ CA CO DE FL HI IA ID IL IN KS MA MD ME MI MN MO MT NC ND NE NH NJ NM
## [26] NV NY OH OK OR PA RI SD TN TX UT VA VT WA WI WV WY AL AR CT DC GA KY LA MS
## [51] SC
## 51 Levels: AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY LA MA ... WY
unique(df_pollutant_monthly$State)
## [1] Alabama Alaska Arizona
## [4] Arkansas California Colorado
## [7] Connecticut Country Of Mexico Delaware
## [10] District Of Columbia Florida Georgia
## [13] Hawaii Idaho Illinois
## [16] Indiana Iowa Kansas
## [19] Kentucky Louisiana Maine
## [22] Maryland Massachusetts Michigan
## [25] Minnesota Mississippi Missouri
## [28] Montana Nebraska Nevada
## [31] New Hampshire New Jersey New Mexico
## [34] New York North Carolina North Dakota
## [37] Ohio Oklahoma Oregon
## [40] Pennsylvania Puerto Rico Rhode Island
## [43] South Carolina South Dakota Tennessee
## [46] Texas Utah Vermont
## [49] Virginia Washington West Virginia
## [52] Wisconsin Wyoming Canada
## [55] Virgin Islands
## 55 Levels: Alabama Alaska Arizona Arkansas California Canada ... Wyoming
state_abbreviations <- c(AL = "Alabama", AK = "Alaska", AZ = "Arizona", AR = "Arkansas", CA = "California",
CO = "Colorado", CT = "Connecticut", DE = "Delaware", FL = "Florida", GA = "Georgia",
HI = "Hawaii", ID = "Idaho", IL = "Illinois", IN = "Indiana", IA = "Iowa",
KS = "Kansas", KY = "Kentucky", LA = "Louisiana", ME = "Maine", MD = "Maryland",
MA = "Massachusetts", MI = "Michigan", MN = "Minnesota", MS = "Mississippi", MO = "Missouri",
MT = "Montana", NE = "Nebraska", NV = "Nevada", NH = "New Hampshire", NJ = "New Jersey",
NM = "New Mexico", NY = "New York", NC = "North Carolina", ND = "North Dakota",
OH = "Ohio", OK = "Oklahoma", OR = "Oregon", PA = "Pennsylvania", RI = "Rhode Island",
SC = "South Carolina", SD = "South Dakota", TN = "Tennessee", TX = "Texas", UT = "Utah",
VT = "Vermont", VA = "Virginia", WA = "Washington", WV = "West Virginia", WI = "Wisconsin",
WY = "Wyoming")
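# Convert full state names to two-letter codes so they match df_generation_monthly$State.
# Names with no entry in the lookup (e.g. "Canada", "Puerto Rico") become NA.
df_pollutant_monthly$State <- names(state_abbreviations)[
  match(as.character(df_pollutant_monthly$State), state_abbreviations)]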
df_merged_top_states <- merge(df_generation_monthly %>% filter(State %in% top_states),
df_pollutant_monthly %>% filter(State %in% top_states),
by = c("State", "Date"))
str(df_merged_top_states)
## 'data.frame': 12070 obs. of 8 variables:
## $ State : Factor w/ 51 levels "AK","AL","AR",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ Date : Date, format: "2001-01-01" "2001-01-01" ...
## $ Year : int 2001 2001 2001 2001 2001 2001 2001 2001 2001 2001 ...
## $ Month : int 1 1 1 1 1 2 2 2 2 2 ...
## $ NetGeneration: num 137203 137203 137203 137203 137203 ...
## $ Pollutant : Factor w/ 5 levels "Carbon monoxide",..: 4 3 5 2 1 1 5 2 3 4 ...
## $ Unit : Factor w/ 4 levels "Micrograms/cubic meter (25 C)",..: 2 1 3 3 4 4 3 3 1 2 ...
## $ Mean : num 25.1 33.42 1.72 21.63 1.06 ...
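As an alternative to typing the name-to-abbreviation lookup by hand, base R ships the parallel vectors `state.name` and `state.abb` in its `datasets` package. The sketch below shows the same conversion using them. The input vector `full_names` is a hypothetical example, and the built-in vectors cover only the 50 states, so entries such as "District Of Columbia" or "Canada" map to `NA`, as they do with the manual lookup above.

```r
# state.name and state.abb are parallel character vectors from base R's datasets package.
full_names <- c("California", "Texas", "Iowa", "Canada", "District Of Columbia")

# match() finds each name's position in state.name; unmatched names give NA.
codes <- state.abb[match(full_names, state.name)]
print(codes)
```

Using the built-in vectors avoids transcription errors in a hand-written table, at the cost of having to handle DC and the territories separately if they are needed.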
# Creating a list of unique pollutants in our dataset
unique_pollutants <- unique(df_merged_top_states$Pollutant)
# Creating a list to store the plots
scatter_plots <- list()
# Looping through each unique pollutant and creating a scatter plot
for (pollutant in unique_pollutants) {
plot_data <- df_merged_top_states[df_merged_top_states$Pollutant == pollutant, ]
plot <- ggplot(plot_data, aes(x = NetGeneration, y = Mean, color = State)) +
geom_point() +
labs(title = paste("Net Generation vs.", pollutant),
x = "Net Generation",
y = "Pollutant Concentration",
color = "State") +
theme_minimal()
scatter_plots[[pollutant]] <- plot
}
# Printing the scatter plots
scatter_plots
## $`PM2.5 - Local Conditions`
##
## $`PM10 Total 0-10um STP`
##
## $`Sulfur dioxide`
##
## $`Nitrogen dioxide (NO2)`
##
## $`Carbon monoxide`
Interpretation of the Scatter Plots
PM2.5 and Net Generation: There doesn’t seem to be a clear relationship between net generation and PM2.5 levels. While some states with higher net generation have lower PM2.5 levels, the data is scattered, suggesting other factors may be at play in determining PM2.5 concentrations.
PM10 and Net Generation: Similar to PM2.5, the PM10 concentrations do not show a clear trend in relation to net generation. There is significant scatter across all levels of net generation.
Sulfur Dioxide (SO2) and Net Generation: The plot for SO2 shows a dense clustering of lower SO2 levels at higher net generation, which might indicate that higher renewable energy generation could be associated with lower SO2 concentrations.
Nitrogen Dioxide (NO2) and Net Generation: The scatter plot for NO2 presents a wide distribution of pollutant concentrations across the net generation axis, indicating no strong correlation between the two.
Carbon Monoxide (CO) and Net Generation: The CO levels are spread across the generation axis, but there is a noticeable cluster of lower CO concentrations at higher levels of net generation.
In conclusion, of the scatter plots for PM2.5, PM10, NO2, SO2, and CO plotted against renewable energy generation, those for Sulfur Dioxide (SO2) and Carbon Monoxide (CO) show some indication of a relationship. For SO2, higher levels of renewable energy generation seem to correspond with a clustering of lower pollutant concentrations. Similarly, the scatter plot for CO shows a clustering of data points towards lower CO levels as net generation increases. These observations suggest an inverse relationship in which increased renewable energy generation could be associated with lower ambient concentrations of SO2 and CO, pollutants typically associated with the combustion of fossil fuels.
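The visual comparison above could be complemented by a numeric ranking of how strongly each pollutant co-varies with net generation. The following is a minimal sketch, assuming a long-format data frame with the same `Pollutant`, `NetGeneration` and `Mean` columns as `df_merged_top_states`. The data here are synthetic, and `long`, `cor_by_pollutant` and `ranked` are hypothetical names.

```r
# Synthetic long-format data (made-up values, not the study data):
# one pollutant built to decline with generation, one built to be unrelated.
set.seed(42)
n <- 120
gen <- runif(n, 1e5, 5e6)  # hypothetical monthly net generation (MWh)
long <- rbind(
  data.frame(Pollutant = "Sulfur dioxide", NetGeneration = gen,
             Mean = 2 - 3e-07 * gen + rnorm(n, sd = 0.2)),
  data.frame(Pollutant = "PM2.5 - Local Conditions", NetGeneration = gen,
             Mean = 10 + rnorm(n, sd = 3))
)

# Pearson correlation between net generation and concentration, per pollutant.
cor_by_pollutant <- sapply(split(long, long$Pollutant),
                           function(d) cor(d$NetGeneration, d$Mean))

# Most negative first: the pollutant with the strongest inverse association.
ranked <- sort(cor_by_pollutant)
print(round(ranked, 2))
```

A correlation of this kind measures association only. Both series may share a time trend, so a strongly negative value would not by itself show that renewable generation caused the decline.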
What is the relationship between distributed renewable energy generation and the level of air pollution?
Among air quality indicators (PM10, PM2.5, CO, NO2, and SO2), which display the most significant response to variations in energy generation?
Kabeyi, Moses Jeremiah Barasa, and Oludolapo Akanni Olanrewaju. ‘Sustainable Energy Transition for Renewable and Low Carbon Grid Electricity Generation and Supply’. Frontiers in Energy Research, vol. 9, 2022. https://www.frontiersin.org/articles/10.3389/fenrg.2021.743114.
Union of Concerned Scientists. ‘Environmental Impacts of Renewable Energy Technologies’. https://www.ucsusa.org/resources/environmental-impacts-renewable-energy-technologies. Accessed 6 Dec. 2023.
National Geographic. ‘Air Pollution’. https://education.nationalgeographic.org/resource/air-pollution. Accessed 6 Dec. 2023.